Mixed-membership models of scientific publications.
Abstract
PNAS is one of the world's most cited multidisciplinary scientific journals. The official PNAS subject classification is reflected in topic labels submitted by the authors of articles, and these labels largely follow traditionally established disciplines. They include broad field classifications into physical sciences, biological sciences, and social sciences, and further subtopic classifications within the fields. Focusing on biological sciences, we explore an internal soft-classification structure of articles based only on semantic decompositions of abstracts and bibliographies and compare it with the formal discipline classifications. Our model assumes that there is a fixed number of internal categories, each characterized by multinomial distributions over words (in abstracts) and references (in bibliographies). Soft classification for each article is based on the proportions of the article's content coming from each category. We discuss the appropriateness of the model for the PNAS database as well as other features of the data relevant to soft classification.
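The generative idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two toy categories, their vocabularies and reference keys, the function names, and the use of a Dirichlet prior for the per-article membership proportions are all assumptions made for illustration. The abstract itself states only that there is a fixed number of categories, each with multinomial distributions over words and references, and that each article draws its content from the categories in article-specific proportions.

```python
import random

# Hypothetical toy categories: each has one multinomial over abstract words
# and one over bibliography references (all names and numbers are invented).
WORD_DISTS = [
    {"protein": 0.5, "binding": 0.3, "structure": 0.2},
    {"neuron": 0.6, "synapse": 0.3, "cortex": 0.1},
]
REF_DISTS = [
    {"ref_A": 0.7, "ref_B": 0.3},
    {"ref_C": 0.4, "ref_D": 0.6},
]


def sample_dirichlet(alpha, rng):
    """Draw proportions from a Dirichlet by normalizing Gamma draws.

    The Dirichlet prior is an assumption of this sketch.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_items(dists, theta, n_items, rng):
    """For each item, pick a category from theta, then an item from it."""
    categories = list(range(len(theta)))
    items = []
    for _ in range(n_items):
        z = rng.choices(categories, weights=theta)[0]
        names = list(dists[z].keys())
        probs = list(dists[z].values())
        items.append(rng.choices(names, weights=probs)[0])
    return items


def generate_article(word_dists, ref_dists, alpha, n_words, n_refs, rng):
    """Generate one article's membership proportions, words, and references."""
    theta = sample_dirichlet(alpha, rng)  # soft classification of the article
    words = sample_items(word_dists, theta, n_words, rng)
    refs = sample_items(ref_dists, theta, n_refs, rng)
    return theta, words, refs


rng = random.Random(0)
theta, words, refs = generate_article(
    WORD_DISTS, REF_DISTS, alpha=[1.0, 1.0], n_words=20, n_refs=5, rng=rng
)
print("membership proportions:", [round(t, 3) for t in theta])
print("words:", words)
print("references:", refs)
```

In this sketch the vector `theta` plays the role of the soft classification: an article with proportions near (0.5, 0.5) mixes vocabulary and citations from both categories, while one near (1, 0) behaves like a member of a single category. Fitting such a model to real abstracts and bibliographies requires an inference procedure, which is not shown here.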
Similar resources
Discovery of Latent Patterns with Hierarchical Bayesian Mixed-Membership Models and the Issue of Model Choice
Model choice is a major methodological issue in the explosive growth of data-mining models involving latent structure for clustering and classification, especially because models often have different parameterizations and very different specifications and constraints. Here, we work from a general formulation of hierarchical Bayesian mixed-membership models and present several model specificatio...
Hierarchical Bayesian Mixed-Membership Models and Latent Pattern Discovery
Hierarchical Bayesian methods expanded markedly with the introduction of MCMC computation in the 1980s, and this was followed by the explosive growth of machine learning tools involving latent structure for clustering and classification. Nonetheless, model choice remains a major methodological issue, largely because competing models used in machine learning often have different parameterization...
Discovering Latent Patterns with Hierarchical Bayesian Mixed-Membership Models
There has been an explosive growth of data-mining models involving latent structure for clustering and classification. While having related objectives these models use different parameterizations and often very different specifications and constraints. Model choice is thus a major methodological issue and a crucial practical one for applications. In this paper, we work from a general formulatio...
Introduction to Mixed Membership Models and Methods
1.1 Historical Developments; 1.2 A General Formulation for Mixed Membership Models; 1.3 Advantages of Mixed Membership Models in Applied Statistics ...
Bayesian Mixed Membership Models for Soft Classification
The paper describes and applies a fully Bayesian approach to soft classification using mixed membership models. Our model structure has assumptions on four levels: population, subject, latent variable, and sampling scheme. Population level assumptions describe the general structure of the population that is common to all subjects. Subject level assumptions specify the distribution of observable...
Journal title:
- Proceedings of the National Academy of Sciences of the United States of America
Volume 101, Suppl 1
Publication date: 2004